Main Idea



Schema Mapping as Query Discovery

1. Describe your invention, stating the problem solved (if appropriate), and indicating the advantages of
using the invention.

Modern data intensive applications, including data warehousing, global information systems and electronic
commerce, require the interoperation of several heterogeneous components, each having its own
individual representation of data. To enable these applications to deal with this heterogeneity, we must
solve the schema mapping problem, in which a source (legacy) data representation is mapped into a
different, but fixed, target schema. Schema mapping involves the discovery of a query or set of queries
that transform the source data into the new structure. We introduce an interactive mapping creation
paradigm that relies on the use of value correspondences that show how a value of a target attribute can
be created from a set of values of source attributes. We have implemented this mapping creation
paradigm in Clio, a prototype tool for semi-automated schema mapping. This disclosure claims the
incremental algorithm for schema mapping at the heart of Clio as a new invention.

Two clear advantages of using this algorithm for schema mapping are:

1. Ease of use: The user works with relationships between individual source and target attribute
values and lets Clio generate the possibly complex SQL statements that realize the mapping.

2. Generality & Power: The algorithm handles complex mappings involving joins, aggregates,
nesting, and set-theoretic operations like union, intersection, and difference.

2. How does the invention solve the problem or achieve an advantage (a description of "the invention"
including figures inline as appropriate)?



paperpd 

We present an algorithm that guides a user through the process of mapping a source schema into a target
schema. The details of the algorithm can be found in Section 4 of the attached paper. We summarize the
algorithm here.

We claim the incremental algorithm at the core of Clio as a new mechanism to guide users through the
process of schema mapping. This algorithm takes as input a set of value correspondences and produces
as output a view definition that expresses the target schema as a function of the source schema. A value
correspondence is a function defining how a value (or combination of values) from a source database can
be combined to form a value in the target. For example, a string concatenation function can be used to
indicate that a value of a staff-id attribute of a target schema is formed by concatenating the letter 'E' to an
employee number from the source. Value correspondences may be entered by the user or may be
suggested by Clio through some discovery process. Because value correspondences pertain to a single
target attribute, they are simple for users to understand and construct, as opposed to manually
constructing SQL views.
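The staff-id example above can be made concrete with a minimal sketch. The class name, the attribute
names (Staff.staff_id, Employee.emp_no) and the row layout are assumptions chosen for illustration; they
are not taken from Clio.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ValueCorrespondence:
    """A function from one or more source attribute values to one target attribute value."""
    target_attr: str               # hypothetical name, e.g. "Staff.staff_id"
    source_attrs: Tuple[str, ...]  # hypothetical names, e.g. ("Employee.emp_no",)
    fn: Callable[..., object]      # how the source values are combined

    def apply(self, row: dict) -> object:
        # Look up each source attribute in the row and combine the values.
        return self.fn(*(row[a] for a in self.source_attrs))


# Concatenate the letter 'E' with an employee number to form a staff id.
staff_id = ValueCorrespondence(
    target_attr="Staff.staff_id",
    source_attrs=("Employee.emp_no",),
    fn=lambda emp_no: "E" + str(emp_no),
)

print(staff_id.apply({"Employee.emp_no": 1042}))
```

Because each correspondence names exactly one target attribute, it can be written and checked without
reference to the joins or unions that the final view will need.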

Given a target relation and a set of value correspondences that define how the values of that target
relation are constructed from values in some source relations, the algorithm performs the following steps:

1. The value correspondences are divided into potential candidate sets. Each potential candidate
set represents a single way of mapping the attributes in a target relation. By definition there is at most one
value correspondence per attribute of the target relation in a potential candidate set.

2. Potential candidate sets are examined to see if a join condition is needed. If the value
correspondences in a set map source values from several source relations, a join condition (or conditions)
between the source relations will need to be discovered. Potential candidate sets for which a join
condition can be found are now called candidate sets.

3. Candidate sets are ranked and combined into covers for the target relation. A cover is a subset
of candidate sets such that every value correspondence that maps an attribute of the target relation
appears at least once in that subset. If multiple covers exist, they are ranked and presented to the user
for evaluation.

4. The selected cover is converted to a query view definition (currently, a SQL view) that can be
used to populate the target relation.
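The four steps above can be sketched as follows. This is an illustrative toy under stated assumptions, not
Clio's implementation: the relations, the join predicate, the ranking rule (fewest candidate sets first),
the use of UNION to combine candidate sets, and the NULL padding of unmapped attributes are all
assumptions. Step 2 here only considers direct joins between the relations that a set already uses.

```python
from itertools import combinations, product

# Hypothetical input: (name, target attribute, source relation, SQL expression).
VCS = [
    ("v1", "staff_id", "Employee", "'E' || Employee.emp_no"),
    ("v2", "name", "Employee", "Employee.name"),
    ("v3", "dept", "Dept", "Dept.dname"),
    ("v4", "name", "Contractor", "Contractor.cname"),
]
# Hypothetical known join predicates between pairs of source relations.
JOINS = {frozenset({"Employee", "Dept"}): "Employee.dno = Dept.dno"}
TARGET_ATTRS = ["staff_id", "name", "dept"]


def potential_candidate_sets(vcs):
    """Step 1: every non-empty choice of at most one correspondence per target attribute."""
    by_attr = {}
    for vc in vcs:
        by_attr.setdefault(vc[1], []).append(vc)
    options = [group + [None] for group in by_attr.values()]
    for choice in product(*options):
        picked = tuple(vc for vc in choice if vc is not None)
        if picked:
            yield picked


def join_condition(cand):
    """Step 2: join predicates connecting the set's relations, or None if none can be found."""
    rels = {vc[2] for vc in cand}
    reached, preds = {min(rels)}, []
    grew = True
    while grew:
        grew = False
        for r in sorted(rels - reached):
            pred = next((JOINS[frozenset({r, q})] for q in sorted(reached)
                         if frozenset({r, q}) in JOINS), None)
            if pred is not None:
                reached.add(r)
                preds.append(pred)
                grew = True
    return preds if reached == rels else None


def covers(cands, vcs):
    """Step 3: subsets of candidate sets in which every correspondence appears at least once."""
    names = {vc[0] for vc in vcs}
    found = [subset
             for k in range(1, len(cands) + 1)
             for subset in combinations(cands, k)
             if {vc[0] for cand in subset for vc in cand} == names]
    return sorted(found, key=len)  # assumed ranking: fewest candidate sets first


def to_sql(cover, target="Staff"):
    """Step 4: one SELECT per candidate set, combined with UNION (an assumption)."""
    selects = []
    for cand in cover:
        expr = {vc[1]: vc[3] for vc in cand}
        cols = ", ".join(f"{expr.get(a, 'NULL')} AS {a}" for a in TARGET_ATTRS)
        rels = ", ".join(sorted({vc[2] for vc in cand}))
        preds = join_condition(cand)
        where = f" WHERE {' AND '.join(preds)}" if preds else ""
        selects.append(f"SELECT {cols} FROM {rels}{where}")
    return f"CREATE VIEW {target} AS\n" + "\nUNION\n".join(selects)


cands = [c for c in potential_candidate_sets(VCS) if join_condition(c) is not None]
best = covers(cands, VCS)[0]
print(to_sql(best))
```

The exhaustive enumeration in covers is what makes the search exponential in the worst case; a real tool
would prune it with ranking heuristics rather than generate every subset.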

An incremental version of the algorithm (which is what is implemented in Clio) uses the same steps. This
interactive and incremental algorithm takes as input the currently selected cover and a single modification
to the set of value correspondences (e.g., a new value correspondence or the deletion of an existing value
correspondence). The result of a single iteration of the incremental algorithm is a new cover that includes
the incremental modification. A set of heuristics, described in the attached paper, guides the search for
new covers in what is, in the worst case, an exponential search.
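A single iteration of the incremental version might look like the sketch below. The placement rule used
here (add the new correspondence to the first candidate set that has no correspondence for that target
attribute and whose relations remain joinable, otherwise start a new candidate set) is an assumed
stand-in for the heuristics in the attached paper, and all names are hypothetical.

```python
def joinable(rels, joins):
    """True if the relations are connected through the known (hypothetical) join edges."""
    rels = set(rels)
    reached = {min(rels)}
    grew = True
    while grew:
        grew = False
        for r in sorted(rels - reached):
            if any(frozenset({r, q}) in joins for q in reached):
                reached.add(r)
                grew = True
    return reached == rels


def add_vc(cover, vc, joins):
    """Return a new cover that includes vc = (name, target attribute, source relation)."""
    for i, cand in enumerate(cover):
        attr_free = all(other[1] != vc[1] for other in cand)
        if attr_free and joinable({o[2] for o in cand} | {vc[2]}, joins):
            return cover[:i] + [cand + [vc]] + cover[i + 1:]
    return cover + [[vc]]  # no compatible candidate set: start a new one


def delete_vc(cover, name):
    """Return a new cover with the named correspondence removed; empty sets are dropped."""
    pruned = [[vc for vc in cand if vc[0] != name] for cand in cover]
    return [cand for cand in pruned if cand]


joins = {frozenset({"Employee", "Dept"})}
cover = [[("v1", "staff_id", "Employee")]]
cover = add_vc(cover, ("v3", "dept", "Dept"), joins)         # joins the existing set
cover = add_vc(cover, ("v4", "name", "Contractor"), joins)   # no join found: new set
print(cover)
cover = delete_vc(cover, "v4")
print(cover)
```

Each call works from the current cover and one modification, so the user sees an updated mapping after
every change instead of rerunning the full search.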

We claim the following inventions:

1. An interactive algorithm that guides the user towards the most likely schema mapping using
simple value correspondences.

2. A division of the process into four interactive steps.

3. A set of heuristics that guide ranking the results of each step. This ranking of results helps guide
the search for the most likely mapping inside an exponentially large set of possible mappings.

4. An incremental version of the algorithm.

5. A schema mapping tool that can handle value correspondences taking as input values from
multiple sources and with value mappings that provide different definitions for the same target value.




3. If the same advantage or problem has been identified by others (inside/outside IBM), how have those
others solved it and does your solution differ and why is it better?

Related work falls into two major classes: schema integration and data transformation tools. Schema
integration research focuses on the problem of converging two or more distinct source schemas and does
not typically worry about creating mappings from the source schemas to the converged schema (see
attached paper for references). Few commercial schema integration tools exist. Evoke's Migration
Architect™ (www.evokesoft.com) is one. Migration Architect™ automatically collects dependency
information from legacy data sources and guides the user towards the construction of a normalized target
(relational) schema. Mappings between source and target schemas are tracked as the normalized target
schema is created. Because the target schema is derived from the source, mappings are typically quite
simple.

Data transformation tools allow users to specify correspondences between a source and a target schema.
Typically, these tools generate programs (code) to capture the needed transformations and either restrict
the user to correspondences between a single source and a single target, or, if multiple sources are
APPENDIX C - RELATED PROCEEDINGS

None (this sheet made necessary by 69 Fed. Reg. 155 (August 2004), page 49978.)